@jiqing-feng Ran out of time today. We will check the two PRs and test tomorrow to see if there is anything we should change. The only thing I may want to change is to make sure transformers/optimum do not call any internal methods such as
Signed-off-by: jiqing-feng <[email protected]>
Agreed, I have integrated that change.
Hi @Qubitium. The optimum and transformers PRs have been verified on CPU. Do you mind verifying them on CUDA? I always hit build issues when building gptqmodel from source.
@jiqing-feng Ok. Can you show me your CUDA compile errors? I want to check whether they are related to our compiler flags and/or env.
@CSY-ModelCloud I see a 404 urllib error. Is it caused by our whl download code?
@jiqing-feng Please change the transformers and optimum PRs into draft mode until they pass tests. Right now they are not passing and some changes are required.
Got it. |
@jiqing-feng The biggest issue right now is that gptqmodel's internal format is gptq_v2, so directly using the quant-linear does not work for old quantized models, such as TheBloke's or those from other GPTQ quantizers that use gptq_v1. The fix is that gptqmodel needs to receive the full
We are currently discussing how best to go about this with minimal changes.
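For context on the v1/v2 gap above: in the older gptq_v1 checkpoint layout (used by AutoGPTQ-era models), the packed `qzeros` carry an offset that the kernels compensate for at dequantization time, while gptq_v2 stores the actual zero points. Below is a minimal sketch of that kind of repack, assuming 4-bit quantization, numpy, and that the only difference is a per-element +1 offset applied modulo 2^bits. The function names are illustrative, not GPTQModel's API; the real conversion lives in GPTQModel's own format code and handles more cases.

```python
import numpy as np

BITS = 4
PACK_FACTOR = 32 // BITS          # 8 four-bit values per int32
MASK = (1 << BITS) - 1            # 0xF

def unpack_qzeros(qzeros: np.ndarray) -> np.ndarray:
    """Unpack int32-packed 4-bit zero points into one value per element."""
    shifts = np.arange(PACK_FACTOR) * BITS
    unpacked = (qzeros[..., None] >> shifts) & MASK
    return unpacked.reshape(*qzeros.shape[:-1], -1)

def pack_qzeros(zeros: np.ndarray) -> np.ndarray:
    """Repack individual 4-bit zero points into int32 words."""
    zeros = zeros.reshape(*zeros.shape[:-1], -1, PACK_FACTOR).astype(np.uint32)
    shifts = np.arange(PACK_FACTOR, dtype=np.uint32) * BITS
    packed = (zeros << shifts).sum(axis=-1, dtype=np.uint32)
    return packed.astype(np.int32)

def convert_qzeros_v1_to_v2(qzeros_v1: np.ndarray) -> np.ndarray:
    # Assumption for illustration: v1 stores (zero - 1), v2 stores the
    # actual zero point, so converting adds 1 to each unpacked nibble.
    zeros = unpack_qzeros(qzeros_v1)
    return pack_qzeros((zeros + 1) & MASK)
```

A standalone repack like this is only part of the story; the point being discussed above is that gptqmodel has to see the whole checkpoint config to know which format it was handed in the first place.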
Unit tests in #724 for quantization and inference with GPTQ and GPTQ_v2 have been fixed. I created two PRs requesting to merge into jiqing-feng's branch: jiqing-feng/optimum, https://github.com/jiqing-feng/optimum/pull/1/files
This PR enables the transformers example.
For the optimum lib, see: huggingface/optimum#2064
For the transformers lib, see: huggingface/transformers#35012
Apply the two changes and this PR can run the example:
transformers_usage.py